download compression is too slow #727

fengelniederhammer · 2024-04-03T12:59:17Z

resolves #717

Tests on a medium-sized data set (3552 sequences). I measured request times of /sample/unalignedNucleotideSequences:

- uncompressed: 1300 ms, 65 MB
- gzip compressed: 3100 ms (before: 14 s), 8 MB
- zstd compressed: 970 ms (before: 7.4 s), 342 kB

I tested this using the Ebola dataset from Loculus. I pushed it to a branch: https://github.com/GenSpectrum/LAPIS/tree/ebolaTestData

PR Checklist

~~[ ] All necessary documentation has been adapted.~~
~~[ ] The implemented feature is covered by an appropriate test.~~


chaoran-chen

Cool! What was the issue before? (I don't really understand what the changed code is doing.)

fengelniederhammer · 2024-04-03T13:09:45Z

lapis2/src/main/kotlin/org/genspectrum/lapis/controller/CompressionFilter.kt

+    override fun write(bytes: ByteArray) {
+        compressingStream.write(bytes)
+    }
+
+    override fun write(
+        bytes: ByteArray,
+        offset: Int,
+        length: Int,
+    ) {
+        compressingStream.write(bytes, offset, length)
+    }


@chaoran-chen Those are the relevant changes. Previously we didn't forward these two methods to the compressing stream, so the default implementation called fun write(byte: Int) once for every entry of the ByteArray. Apparently the compression streams can do a lot better with a whole array than with every byte written individually.
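To illustrate the effect described above, here is a minimal sketch that is not taken from CompressionFilter.kt. It relies only on the documented default behaviour of java.io.OutputStream, whose array-based write methods fall back to one write(Int) call per byte unless they are overridden. The class names SingleByteForwarder, BulkForwarder and CallCounter are hypothetical and exist only for this example.

```kotlin
import java.io.OutputStream

// Hypothetical stand-in for the compressing stream: it only counts how it is called.
class CallCounter : OutputStream() {
    var singleByteCalls = 0
        private set
    var bulkCalls = 0
        private set

    override fun write(byte: Int) {
        singleByteCalls++
    }

    override fun write(bytes: ByteArray, offset: Int, length: Int) {
        bulkCalls++
    }
}

// Wrapper that overrides only write(Int), as assumed for the state before the fix.
// The inherited array-based methods loop over the array and call write(Int) per byte.
class SingleByteForwarder(private val target: OutputStream) : OutputStream() {
    override fun write(byte: Int) {
        target.write(byte)
    }
}

// Wrapper that also forwards the array-based methods, mirroring the diff above.
class BulkForwarder(private val target: OutputStream) : OutputStream() {
    override fun write(byte: Int) {
        target.write(byte)
    }

    override fun write(bytes: ByteArray) {
        target.write(bytes)
    }

    override fun write(bytes: ByteArray, offset: Int, length: Int) {
        target.write(bytes, offset, length)
    }
}

fun main() {
    // Arbitrary payload; the size is an assumption chosen for the demonstration.
    val payload = ByteArray(100_000) { (it % 4).toByte() }

    val slowTarget = CallCounter()
    SingleByteForwarder(slowTarget).write(payload)

    val fastTarget = CallCounter()
    BulkForwarder(fastTarget).write(payload)

    println("single-byte forwarder: ${slowTarget.singleByteCalls} write(Int) calls, ${slowTarget.bulkCalls} bulk calls")
    println("bulk forwarder: ${fastTarget.singleByteCalls} write(Int) calls, ${fastTarget.bulkCalls} bulk calls")
}
```

With the single-byte wrapper, the target receives one call per payload byte. With the forwarding wrapper, it receives the whole array in a single call, which gives a compressing stream the chance to process the data in one go.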

JonasKellerer

LGTM (works on my machine :) )

fengelniederhammer added 2 commits April 3, 2024 14:34

feat: log whether request was cached #717

20cfe15

and reduce logging noise from data version checker

fix: speed up compressing responses #717

202e3fa


fengelniederhammer requested a review from JonasKellerer April 3, 2024 12:59

chaoran-chen reviewed Apr 3, 2024


fengelniederhammer commented Apr 3, 2024


JonasKellerer approved these changes Apr 3, 2024


fengelniederhammer merged commit 1624580 into main Apr 3, 2024
10 checks passed

fengelniederhammer deleted the 717-download-compression-is-too-slow branch April 3, 2024 14:27
